Approximate pattern matching with k-mismatches in packed text
Authors
Abstract
Given strings $P$ of length $m$ and $T$ of length $n$ over an alphabet of size $\sigma$, the string matching with $k$-mismatches problem is to find the positions of all substrings of $T$ that are at Hamming distance at most $k$ from $P$. If $T$ can be read only one character at a time, the best known bounds are $O(n\sqrt{k \log k})$ and $O(n + n\sqrt{k/w}\,\log k)$ in the word-RAM model with word length $w$. In the RAM models (including $AC^0$ and word-RAM) it is possible to read up to $\lfloor w/\log\sigma \rfloor$ characters in constant time if the characters of $T$ are encoded using $\lceil \log\sigma \rceil$ bits. The only solution for $k$-mismatches in packed text works in $O\big((n \log\sigma/\log n)\,\lceil m \log(k + \log n/\log\sigma)/w \rceil + n^{\varepsilon}\big)$ time, for any $\varepsilon > 0$. We present an algorithm that runs in time $O\big(\frac{n}{\lfloor w/(m \log\sigma) \rfloor}\,(1 + \log\min(k, \sigma)\,\log m/\log\sigma)\big)$ in the $AC^0$ model if $m = O(w/\log\sigma)$ and $T$ is given packed. We also describe a simpler variant that runs in time $O\big(\frac{n}{\lfloor w/(m \log\sigma) \rfloor}\,\log\min(m, \log w/\log\sigma)\big)$ in the word-RAM model. The algorithms improve the existing bound for $w = \Omega(\log^{1+\varepsilon} n)$, for any $\varepsilon > 0$. Based on the introduced technique, we present algorithms for several other approximate matching problems.
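To make the packed-text setting concrete, the following minimal Python sketch packs each character into $\lceil \log\sigma \rceil$ bits and counts mismatches between two packed words with one XOR plus a field-folding step. It is only an illustration of the problem statement, not the algorithm of the paper: the function names `pack`, `packed_mismatches` and `k_mismatches` are hypothetical, and Python's unbounded integers stand in for machine words of length $w$.

```python
from math import ceil, log2

def pack(s, alphabet):
    """Pack s into one integer, ceil(log2(sigma)) bits per character (hypothetical helper)."""
    bits = max(1, ceil(log2(len(alphabet))))
    code = {c: i for i, c in enumerate(alphabet)}
    word = 0
    for i, c in enumerate(s):
        word |= code[c] << (i * bits)  # character i occupies field i
    return word, bits

def packed_mismatches(x, y, length, bits):
    """Count the fields (characters) in which two packed words differ."""
    diff = x ^ y  # a field is non-zero exactly where the characters differ
    folded = 0
    for b in range(bits):
        folded |= diff >> b  # OR every bit of a field onto the field's lowest bit
    low_bits = sum(1 << (i * bits) for i in range(length))
    return bin(folded & low_bits).count("1")

def k_mismatches(text, pattern, k, alphabet):
    """Start positions where text is within Hamming distance k of pattern."""
    m = len(pattern)
    p, bits = pack(pattern, alphabet)
    t, _ = pack(text, alphabet)
    window = (1 << (m * bits)) - 1  # mask selecting m consecutive fields
    return [i for i in range(len(text) - m + 1)
            if packed_mismatches((t >> (i * bits)) & window, p, m, bits) <= k]
```

For example, `k_mismatches("ACGTACGA", "ACGT", 1, "ACGT")` reports positions 0 and 4, because the window `ACGA` differs from the pattern in a single character. The sketch still inspects every alignment separately, so it shows the encoding only and does not achieve the bounds stated in the abstract.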
Similar resources
A Parallel Algorithm for Fixed-Length Approximate String-Matching with k-mismatches
This paper deals with the approximate string-matching problem with Hamming distance. The approximate string-matching with k-mismatches problem is to find all locations at which a query of length m matches a factor of a text of length n with k or fewer mismatches. The approximate string-matching algorithms have both pleasing theoretical features, as well as direct applications, especially in comp...
On string matching with k mismatches
In this paper we consider several variants of the pattern matching problem. In particular, we investigate the following problems: 1) Pattern matching with k mismatches; 2) Approximate counting of mismatches; and 3) Pattern matching with mismatches. The distance metric used is the Hamming distance. We present some novel algorithms and techniques for solving these problems. Both deterministic and...
Exact and Approximate Two Dimensional Pattern Matching allowing Rotations
We give fast filtering algorithms for searching a 2-dimensional pattern in a 2-dimensional text allowing any rotation of the pattern. We consider the cases of exact and approximate matching under several matching models, improving the previous results. For a text of size n × n characters and a pattern of size m × m characters, the exact matching takes average time O(n^2/m). If we allow k-mismatches o...
Approximate Boyer-Moore String Matching
The Boyer-Moore idea applied in exact string matching is generalized to approximate string matching. Two versions of the problem are considered. The k mismatches problem is to find all approximate occurrences of a pattern string (length m) in a text string (length n) with at most k mismatches. Our generalized Boyer-Moore algorithm is shown (under a mild independence assumption) to solve the pro...
Approximate String Matching by Finite Automata
Approximate string matching is a sequential problem and therefore it is possible to solve it using finite automata. A nondeterministic finite automaton is constructed for string matching with k mismatches. It is shown how "dynamic programming" and "shift-and" based algorithms simulate this nondeterministic finite automaton. The corresponding deterministic finite automaton have O...
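As a rough illustration of how a shift-and style algorithm can simulate a nondeterministic automaton for k mismatches, the sketch below keeps one bit vector per error level, in the manner of the well-known Wu-Manber scheme restricted to substitutions. It is not taken from the paper summarized above: the function name `shift_and_k_mismatches` is hypothetical, the pattern is assumed to be non-empty, and Python integers stand in for machine words.

```python
def shift_and_k_mismatches(text, pattern, k):
    """Start positions of windows of text within Hamming distance k of pattern.

    Assumes a non-empty pattern. Bit i of rows[j] is set when pattern[0..i]
    matches the text ending at the current position with at most j mismatches.
    """
    m = len(pattern)
    full = (1 << m) - 1
    accept = 1 << (m - 1)
    # masks[c] has bit i set when pattern[i] == c
    masks = {}
    for i, c in enumerate(pattern):
        masks[c] = masks.get(c, 0) | (1 << i)

    rows = [0] * (k + 1)
    starts = []
    for pos, c in enumerate(text):
        b = masks.get(c, 0)
        prev_old = 0  # value of rows[j - 1] before this text character
        for j in range(k + 1):
            old = rows[j]
            # match transition: advance one state only if the character agrees
            new = ((old << 1) | 1) & b
            if j > 0:
                # mismatch transition: advance from level j - 1 regardless of c
                new |= (prev_old << 1) | 1
            rows[j] = new & full
            prev_old = old
        if rows[k] & accept:
            starts.append(pos - m + 1)
    return starts
```

Each text character costs k + 1 word operations when the pattern fits in one machine word, which is the usual trade-off of this row-per-error-level layout.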
Journal title:
- Inf. Process. Lett.
Volume 113, issue
Pages -
Publication date: 2013